Abstract

In this paper we discuss a genetic version (GWA) of Whitehead's
algorithm, which is one of the basic algorithms in combinatorial group
theory. It turns out that GWA is surprisingly fast and outperforms the
standard Whitehead algorithm in free groups of rank > 5. Experimenting
with GWA, we collected interesting numerical data that clarify the
time complexity of Whitehead's Problem in general. These experiments
led us to several mathematical conjectures. If confirmed, they will
shed light on hidden mechanisms of the Whitehead Method and the geometry
of automorphic orbits in free groups.



1 Introduction 

Genetic algorithms were introduced by J. H. Holland in 1975. Since then
they have been successfully applied to a number of numerical and
combinatorial problems. In most cases genetic algorithms are used in
optimization problems, when searching for an optimal solution or its
approximation (see, for example, survey

The first applications of genetic algorithms to abstract algebra appeared in 
[TT] and 22 J where we made some initial attempts to study Andrews-Curtis 
conjecture from a computational viewpoint. In the present paper we discuss a
genetic version of the Whitehead algorithm, which is one of the basic
algorithms in combinatorial group theory. It turns out that this Genetic
Whitehead Algorithm (GWA) is surprisingly fast and outperforms the standard
Whitehead algorithm in free groups of rank > 5. Experimenting with GWA, we
were able to collect interesting numerical data which clarify the time
complexity of the Whitehead Problem in general. These experiments led us to
several mathematical conjectures, which we state at the end of the paper. If
confirmed, they will shed light on hidden mechanisms of the Whitehead Method
and the geometry of automorphic orbits in free groups. Actually, the
remarkable performance of GWA has already initiated an investigation of
automorphic orbits in free groups of rank 2. Some of the conclusions that one
can draw from our experiments are worth mentioning here.

One unexpected outcome of our experiments is that the time complexity
functions of Whitehead's algorithms in all their variations do not depend
"essentially" on the length of the input words. We introduce a new type of
size function (Whitehead's Complexity function) on input words which allows
one to measure adequately the time complexity of Whitehead's algorithms. This
type of size function is interesting in its own right: it makes it possible
to compare a given algorithm from a class of algorithms $\mathcal{K}$ with the
best possible non-deterministic algorithm in $\mathcal{K}$.

This Whitehead's complexity function accounts for the observed phenomenon
that most of the words in a given free group are already Whitehead minimal
(have minimal length in their automorphic orbit). Such words have Whitehead's 
complexity and the Whitehead's descent algorithm is meaningless for such 
words. 

Another conclusion we made is that the actual generic (or average) time
complexity of Whitehead's descent algorithm (on non-minimal inputs, of
course) is much less than that of the standard Whitehead algorithm. Moreover,
it does not depend on the rank r of the ambient free group exponentially,
though the standard one does. We believe that there exists a finite subset
(of polynomial size in r) of elementary Whitehead automorphisms in $F_r$ for
which the classical Whitehead descent method does not encounter any "peaks"
on most inputs.

The Genetic Whitehead Algorithm (GWA) was designed and implemented in
1999, and soon after some interesting facts transpired from experiments. But
only recently an adequate group-theoretic language (average case complexity, 
generic elements, asymptotic probabilities on infinite groups) was developed 
which would allow one to describe the group-theoretic part of the observed phe- 
nomena. We refer to papers 0, 0, |S| for details. On the other hand, a 
rigorous theory of genetic algorithms has not yet been developed to the level
which would explain the fast performance of such heuristic algorithms as GWA.
In fact, we believe that a thorough investigation of particular genetic
algorithms in abstract algebra might provide insight into a general theory of
genetic algorithms.

2 Whitehead's method 
2.1 Whitehead Theorem 

Let $X = \{x_1, \ldots, x_n\}$ be a finite set and $F = F_n(X)$ be a free
group with basis $X$. Put $X^{\pm 1} = \{x^{\pm 1} \mid x \in X\}$. We will
represent elements of $F$ by reduced words in the alphabet $X^{\pm 1}$ (i.e.,
words without subwords $xx^{-1}$, $x^{-1}x$ for any $x \in X$). For a word $u$
we denote by $|u|$ the length of $u$; similarly, for a tuple
$U = (u_1, \ldots, u_k) \in F^k$ we denote by $|U|$ the total length
$|U| = |u_1| + \ldots + |u_k|$.

For an automorphism $\varphi$ of $F$ and $k$-tuples $U = (u_1, \ldots, u_k)$,
$V = (v_1, \ldots, v_k) \in F^k$ we write $U\varphi = V$ if
$u_i\varphi = v_i$, $i = 1, \ldots, k$.

In 1936 J.H.C. Whitehead introduced the following algorithmic problem,
which became a central problem of the theory of automorphisms of free groups
[18].

Problem W Given two tuples $U, V \in F^k$ find out if there is an
automorphism $\varphi \in Aut(F)$ such that $U\varphi = V$.

In the same paper he showed (using a topological argument) that this problem
can be solved algorithmically and suggested an algorithm to find such an
automorphism $\varphi$ (if it exists). To explain this method we need the
following definition. An automorphism $t \in Aut(F)$ is called a Whitehead
automorphism if it has one of the following types:

1) $t$ permutes elements in $X^{\pm 1}$;

2) $t$ takes each element $x \in X^{\pm 1}$ to one of the elements $x$, $xa$,
$a^{-1}x$, or $a^{-1}xa$, where $x \neq a^{\pm 1}$ and $a \in X^{\pm 1}$ is a
fixed element.

Denote by $\Omega_n = \Omega(F)$ the set of all Whitehead automorphisms of a
given free group $F = F_n(X)$. It follows from a result of Nielsen that
$\Omega_n$ generates $Aut(F_n(X))$ [H.

Let $T$ be a subset of $Aut(F)$. We say that tuples $U, V \in F^k$ are
$T$-equivalent ($U \sim_T V$) if there exists a finite sequence
$t_1, \ldots, t_m$ (where $t_i \in T^{\pm 1}$) such that
$Ut_1 \ldots t_m = V$. The $T$-equivalence class of a tuple $U$ is called the
$T$-orbit $Orb_T(U)$ of $U$. If $T$ generates $Aut(F_n)$ then the equivalence
class of a tuple $U$ is called the orbit $Orb(U)$ of $U$. Now Problem W can be
stated as a membership problem for a given orbit $Orb(U)$. By $U_{min}$ we
denote any tuple of minimal total length in the orbit $Orb(U)$, and by
$Orb_{min}(U)$ the set of all minimal tuples $U_{min}$.

It is sometimes convenient to look at the Whitehead problem from a
graph-theoretic viewpoint. Denote by $\Gamma(F, k, T)$ the following directed
labelled graph: $F^k$ is the vertex set of $\Gamma$; two vertices
$U, V \in F^k$ are connected by a directed edge from $U$ to $V$ with label
$t \in T$ if and only if $Ut = V$. We refer to
$\Gamma_k(F) = \Gamma(F, k, \Omega)$ as the Whitehead graph of $F$. In the
case when $k = 1$ we write $\Gamma(F)$ instead of $\Gamma_1(F)$. Obviously,
$V \in Orb(U)$ if and only if $U$ and $V$ are in the same connected component
of $\Gamma_k(F)$.

The following theorem is one of the fundamental results in combinatorial 
group theory. 

Theorem 1 (Whitehead [18]). Let $U, V \in F_n(X)^k$ and $V \in Orb(U)$. Then:

1) if $|U| > |V|$, then there exists $t \in \Omega_n$ such that

$$|U| > |Ut|;$$

2) if $|U| = |V|$, then there exist $t_1, \ldots, t_m \in \Omega_n$ such that

$$Ut_1 \ldots t_m = V$$

and $|U| = |Ut_1| = |Ut_1t_2| = \ldots = |Ut_1t_2 \ldots t_m| = |V|$.

In view of Theorem 1, Problem W can be divided into two subproblems:

Problem A For a tuple $U \in F^k$ find a sequence
$t_1, \ldots, t_m \in \Omega_n$ such that

$$Ut_1 \ldots t_m = U_{min}.$$

Problem B For tuples $U, V \in F^k$ with

$$|U| = |U_{min}| = |V_{min}| = |V|$$

find a sequence $t_1, \ldots, t_m \in \Omega_n$ such that
$Ut_1 \ldots t_m = V$.

Whitehead Method and Genetic Algorithms, A. D. Miasnikov, A. G. Myasnikov • 04/17/2003

Theorem 1 gives a solution to both problems above, and hence to
Problem W.

2.2 Whitehead Algorithm 

The procedures described below give algorithmic solutions to the Problems A 
and B, together they are known as Whitehead Algorithm or Whitehead Method. 

2.2.1 Decision algorithm for Problem A 

Following Whitehead we describe below a deterministic decision algorithm for
Problem A; we refer to this algorithm (and to its various modifications) as
DWA. This algorithm executes consecutively the following

"Elementary Length Reduction Routine" (ELR):

Let $U \in F^k$. ELR finds $t \in \Omega_n$ with $|Ut| < |U|$ (if it exists).
Namely, ELR performs the following search. For each $t \in \Omega_n$ compute
the length of the tuple $Ut$ until $|U| > |Ut|$, then put $t_1 = t$,
$U_1 = Ut_1$ and output $U_1$. Otherwise stop and output $U_{min} = U$.

DWA performs ELR on $U$, then performs ELR on $U_1$, and so on, until a
minimal tuple $U_{min}$ is found. We refer to algorithms of this type as
Whitehead's descent method with respect to the set $\Omega_n$.

Clearly, there could be at most $|U|$ repetitions of ELR:

$$|U| > |Ut_1| > \ldots > |Ut_1 \ldots t_l| = |U_{min}|, \quad l \le |U|.$$

The sequence $t_1, \ldots, t_l$ is a solution to Problem A. Notice that the
iteration procedure above simulates the classical gradient descent method
($t_1$ is the best direction from $U$, $t_2$ is the best direction from
$U_1$, etc.).
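The descent just described can be sketched in Python for the case k = 1. This is a minimal illustration and not the implementation used in the experiments: the encoding of the generator x_i as the integer i and of its inverse as -i, and all function names, are assumptions made here.

```python
from itertools import product

def reduce_word(word):
    """Freely reduce a word given as a sequence of nonzero integers."""
    out = []
    for letter in word:
        if out and out[-1] == -letter:
            out.pop()                      # cancel a pair x x^{-1}
        else:
            out.append(letter)
    return tuple(out)

def type2_automorphisms(n):
    """Yield each non-trivial type 2) Whitehead automorphism of F_n
    as a dict {generator: image word}."""
    for a in (s * i for i in range(1, n + 1) for s in (1, -1)):
        others = [x for x in range(1, n + 1) if x != abs(a)]
        for choice in product(range(4), repeat=len(others)):
            if not any(choice):
                continue                   # every generator fixed: identity
            images = {abs(a): (abs(a),)}
            for x, c in zip(others, choice):
                # x -> x, x a, a^{-1} x, or a^{-1} x a
                images[x] = ((x,), (x, a), (-a, x), (-a, x, a))[c]
            yield images

def apply(images, word):
    """Substitution Routine: substitute images letter by letter, then reduce."""
    out = []
    for letter in word:
        img = images[abs(letter)]
        out.extend(img if letter > 0 else tuple(-y for y in reversed(img)))
    return reduce_word(out)

def elr(word, n):
    """Return (t, word t) for the first length-reducing t found, else None."""
    for t in type2_automorphisms(n):
        image = apply(t, word)
        if len(image) < len(word):
            return t, image
    return None

def dwa(word, n):
    """Repeat ELR until no type 2) automorphism shortens the word."""
    word, path = reduce_word(word), []
    while (step := elr(word, n)) is not None:
        path.append(step[0])
        word = step[1]
    return word, path
```

For instance, `dwa((1, 2), 2)` shortens the primitive word x_1 x_2 to a word of length 1, while the commutator `(1, 2, -1, -2)` is returned unchanged because it is already minimal.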

2.2.2 Decision algorithm for Problem B. 

Here we describe a deterministic decision algorithm for Problem B, which is
also due to Whitehead. In the sequel we refer to this algorithm (and its
variations) as DWB.

Let $U, V \in F^k$. DWB constructs $Orb_{min}(U)$ (as well as
$Orb_{min}(V)$) by repeating consecutively the following

"Local Search Routine" (LS):

Let $\Omega_n = \{t_1, \ldots, t_m\}$ and $\Delta$ be a finite graph with
vertices from $F^k$. Given a vertex $W$ in $\Delta$ the local search at $W$
results in a graph $\Delta_W$ which contains $\Delta$. We define $\Delta_W$
recursively. Put $\Gamma_0 = \Delta$, and suppose that $\Gamma_i$ has already
been constructed. If $|Wt_{i+1}| = |W|$ and $Wt_{i+1}$ does not appear in
$\Gamma_i$ then add $Wt_{i+1}$ as a new vertex to $\Gamma_i$, also add a new
edge from $W$ to $Wt_{i+1}$ with label $t_{i+1}$, and denote the resulting
graph by $\Gamma_{i+1}$. Otherwise, put $\Gamma_{i+1} = \Gamma_i$. The routine
stops in $m$ steps and results in a graph $\Gamma_m$. Put
$\Delta_W = \Gamma_m$.

Figure 1: Whitehead Method.

The construction of $Orb_{min}(U)$ is a variation of the standard

"Breadth-First Search Procedure" (BFS):

Start with a graph $\Delta_0$ consisting of a single vertex $U$. Put
$\Delta_1 = (\Delta_0)_U$ and "mark" the vertex $U$. If a graph $\Delta_i$ has
been constructed, then take any unmarked vertex $W$ in $\Delta_i$ within the
shortest distance from $U$, put $\Delta_{i+1} = (\Delta_i)_W$, and mark the
vertex $W$.
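The local search and breadth-first search routines can be sketched as follows for k = 1. This is only an illustration under assumptions made here (generator x_i encoded as the integer i, its inverse as -i, hypothetical function names); it assumes the input word is already minimal and makes no attempt at efficiency.

```python
from collections import deque
from itertools import permutations, product

def reduce_word(word):
    """Freely reduce a word given as a sequence of nonzero integers."""
    out = []
    for letter in word:
        if out and out[-1] == -letter:
            out.pop()
        else:
            out.append(letter)
    return tuple(out)

def whitehead_automorphisms(n):
    """Yield Whitehead automorphisms of F_n as dicts {generator: image}.
    Identity maps may be yielded more than once; this is harmless here."""
    gens = range(1, n + 1)
    # type 1): permutations of X^{+-1}, i.e. signed permutations of X
    for perm in permutations(gens):
        for signs in product((1, -1), repeat=n):
            yield {x: (s * y,) for x, y, s in zip(gens, perm, signs)}
    # type 2): x -> x, x a, a^{-1} x, or a^{-1} x a for a fixed letter a
    for a in (s * i for i in gens for s in (1, -1)):
        others = [x for x in gens if x != abs(a)]
        for choice in product(range(4), repeat=len(others)):
            images = {abs(a): (abs(a),)}
            for x, c in zip(others, choice):
                images[x] = ((x,), (x, a), (-a, x), (-a, x, a))[c]
            yield images

def apply(images, word):
    out = []
    for letter in word:
        img = images[abs(letter)]
        out.extend(img if letter > 0 else tuple(-y for y in reversed(img)))
    return reduce_word(out)

def minimal_orbit_tree(u, n):
    """Breadth-first search from a minimal word u along length-preserving
    Whitehead moves. Returns {vertex: (parent, label)}; the root maps to None."""
    u = reduce_word(u)
    autos = list(whitehead_automorphisms(n))
    tree = {u: None}
    queue = deque([u])                 # unmarked vertices, nearest first
    while queue:
        w = queue.popleft()            # "mark" w
        for t in autos:                # local search at w
            v = apply(t, w)
            if len(v) == len(w) and v not in tree:
                tree[v] = (w, t)
                queue.append(v)
    return tree
```

Following the parent pointers from a vertex back to the root recovers a sequence of Whitehead automorphisms of the kind required in Problem B.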

Since $Orb_{min}(U)$ is finite, BFS terminates, say in $l$ steps, where

$$l \le |Orb_{min}(U)|\,|\Omega_n|.$$

It is easy to see that $\Delta_l$ is a tree containing all vertices from
$Orb_{min}(U)$. This implies that $V \in Orb_{min}(U)$ if and only if
$V \in \Delta_l$. Moreover, the unique path connecting $U$ and $V$ in
$\Delta_l$ is a shortest path between $U$ and $V$ in $Orb_{min}(U)$, and the
sequence of labels along this path is a sequence of Whitehead automorphisms
(required in Problem B) that connects $U$ and $V$ inside $Orb_{min}(U)$.

From the computational viewpoint it is more efficient to start building
maximal trees in both graphs $Orb_{min}(U)$ and $Orb_{min}(V)$
simultaneously, until a common vertex occurs.

2.3 Estimates for the time-complexity of Whitehead's
algorithms.

2.3.1 Algorithm DWA.

It is easy to see that transformations of the type 1) cannot reduce the total
length of a tuple. Hence, to solve Problem A one needs only Whitehead
automorphisms of the type 2). It is not hard to show that there are

$$A_n = 2n4^{(n-1)} - 2n$$

non-trivial Whitehead automorphisms of the type 2).

In the worst-case scenario, performing ELR requires $A_n$ executions of the
following

Substitution Routine (SR):

For a given automorphism $t$ of the type 2) make a substitution
$x \to xt$ for each occurrence of each $x \in X^{\pm 1}$ in $U$, and then make
all possible cancellations.

Since the length of the word $xt$ is at most 3, the time needed to perform
this routine is bounded from above by $c|U|$, where $c$ is a constant which
does not depend on $|U|$ and the rank of $F$. Since DWA executes ELR at most
$|U|$ times, the time-complexity function of DWA is bounded from above by

$$cA_n|U|^2 = c(2n4^{n-1} - 2n)|U|^2.$$

This bound depends exponentially on the rank $n$ of the group $F = F_n(X)$.
For example, if $k = 1$, $n = 10$, and $|U| = 100$, the estimated number of
steps for DWA is bounded above by

$$c(20 \cdot 4^9 - 20) \cdot 100^2 > c(5 \cdot 10^{10}).$$

Whether this bound is tight in the worst case is an open question. In any
event, computer experiments which we ran on a dual Pentium III, 700 MHz
computer with 1 GB of memory show (see Table [HJ) that the standard DWA cannot
find $U_{min}$ on almost all inputs $U$ which are pseudo-randomly generated
primitive elements of length more than 100 in the group $F_{10}$, while
working non-stop for more than an hour.
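The arithmetic behind this estimate can be checked directly. The short sketch below evaluates $A_n$ and the bound $A_n|U|^2$ with the constant $c$ left out; the function names are chosen here for illustration only.

```python
def count_type2(n):
    """A_n = 2n * 4^(n-1) - 2n, the number of non-trivial
    type 2) Whitehead automorphisms in rank n."""
    return 2 * n * 4 ** (n - 1) - 2 * n

def dwa_step_bound(n, length):
    """A_n * |U|^2, the upper bound on DWA steps without the constant c."""
    return count_type2(n) * length ** 2

print(count_type2(10))           # 5242860
print(dwa_step_bound(10, 100))   # 52428600000, which exceeds 5 * 10**10
```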

The accuracy of the bound depends on how many automorphisms from $\Omega_n$
reduce the length of a given input $U$. To this end, put

$$LR(U) = \{t \in \Omega_n \mid |Ut| < |U|\}.$$

Now, the number of steps that ELR performs on a worst-case input $U$ is
bounded from above by

$$\max\{A_n - |LR(U)|, 1\}$$

(if the ordering of $\Omega_n$ is such that all automorphisms from $LR(U)$
are located at the end of the list $\Omega_n = \{t_1, \ldots, t_m\}$).

If we assume that the automorphisms from $LR(U)$ are distributed uniformly
in the list $\Omega_n$ then DWA needs

$$\frac{A_n}{|LR(U)|}$$

steps on average to find a length reducing automorphism for $U$.

The results of our experiments (for $k = 1$) indicate that the average value
of $|LR(U)|$ for a non-minimal $U$ of total length $l$ rapidly converges to a
constant when $l \to \infty$. In Table 1 and Figure 2 we present values of
the fraction $\frac{|LR(U)|}{A_n}$ that occur in our experiments for $k = 1$.
This allows us to make the following statement.

Conclusion 1 The average number of length reducing Whitehead automorphisms
for a given "generic" non-minimal word $w \in F_n$ does not depend on the
length $|w|$; it depends only on the rank $n$ of the free group (for
sufficiently long words $w$).

A precise formulation of this statement is given in Sectional 



|w|           F2      F3      F4      F5
0..199        0.24    0.09    0.04    0.03
200..599      0.24    0.09    0.05    0.03
600..999      0.24    0.09    0.04    0.02
1000..1299    0.25    0.09    0.04    0.02
1400..1800    0.24    0.09    0.04    0.02

Table 1: Estimates of $\frac{|LR(U)|}{A_n}$ on inputs of various lengths.

Figure 2: Estimates of $\frac{|LR(U)|}{A_n}$ on inputs of various lengths.



2.3.2 Algorithm DWB 

The obvious upper bound for the time-complexity of DWB is much higher, since
one has to take into account all Whitehead automorphisms. It is easy to see
that there are

$$B_n = 2n(2n - 2)(2n - 4) \ldots 2 = 2^n n!$$

Whitehead automorphisms of the type 1).

Running the LS routine on $U$ requires at most $d(A_n + B_n)$ runs of SR
(which has complexity $c|U|$), where $d$ is a constant which does not depend
on $U$ and $n$. Now, to construct $Orb_{min}(U)$ it takes at most
$|Orb_{min}(U)|$ runs of LS, hence one can bound the time complexity of DWB
from above by

$$d \cdot (A_n + B_n) \cdot c \cdot |U| \cdot |Orb_{min}(U)|.$$

This shows that DWB may be very slow (in the worst case) just because there
are too many Whitehead automorphisms in rank $n$ for big $n$. Moreover, the
size of $Orb_{min}(U)$ can make the situation even worse. Obviously,

$$|Orb_{min}(U)| \le 2n(2n - 1)^{|U|-1}, \qquad (1)$$

hence a very rough estimate gives the following upper bound for the
time-complexity of DWB:

$$d \cdot c \cdot (2n4^{(n-1)} - 2n + 2^n n!) \cdot |U| \cdot 2n(2n - 1)^{|U|-1}.$$

One can try to improve on this upper bound through better estimates of
$|Orb_{min}(U)|$. It has been shown that for $k = 1$ and $n = 2$ the number
$|Orb_{min}(U)|$ is bounded from above by a polynomial in $|U_{min}|$. It was
also conjectured in |E| that this result holds for arbitrary $n > 2$, and for
$n = 2$ the upper bound is the following:

$$|Orb_{min}(U)| \le 8|U_{min}|^2 + 40|U_{min}|.$$

Recently, B. Khan proved in jHj that the bound above holds, indeed. Still,
independently of the size of the set $Orb_{min}(U)$, the number $B_n$ of
elementary Whitehead automorphisms in rank $n$ makes DWB impractical for
sufficiently big $n$.

The net outcome of the discussion above is that the algorithms DWA and 
DWB are intractable for "big" ranks, even though for a fixed rank n DWA is 
quadratic in \U\ and DWB could be polynomial in \U\ (if Conjecture |21 from 
Section El holds). 

2.4 General Length Reduction Problem. 

Observe that the main part of DWA is the elementary length reduction routine
ELR, which for a given tuple $U \in F^k$ finds a Whitehead automorphism
$\varphi \in \Omega(F)$ such that

$$|U\varphi| < |U|. \qquad (2)$$

An arbitrary automorphism $\varphi \in Aut(F)$ is called length-reducing for
$U$ if it satisfies the condition (2) above.

Obviously, to solve Problem A it suffices to find an arbitrary (not
necessarily Whitehead) length-reducing automorphism for a non-minimal tuple
$U$. We have seen in Section 2.3 that the time-complexity of the standard
Whitehead algorithm for Problem A depends mostly on the cardinality of the
set $\Omega_n$, which is huge for big $n$. One of the key ideas for improving
the efficiency of Whitehead algorithms is to replace $\Omega_n$ by another,
smaller set of automorphisms of $F$, or to use a different strategy to find
length-reducing automorphisms. To this end we formulate the following

Length-Reduction Problem (LRP). For a non-minimal tuple $U \in F^k$
find a length-reducing automorphism.

Theorem 1 gives one solution to LRP - the algorithm DWA. In Section 3 we
describe a genetic algorithm which, we believe, solves LRP much more
efficiently on average than DWA.

3 Description of the genetic algorithm 

In this section we describe Genetic Whitehead Algorithm (GWA) for solving 
Whitehead's Problem A. 

Genetic algorithms are stochastic search algorithms driven by a heuristic, 
which is represented by an evaluation function, and special random operators: 
crossover, mutation and selection. 

Let $S$ be a search space. We are looking for an element in $S$ which is a
solution to a given problem. A tuple $P \in S^r$ ($r$ is a fixed positive
integer) is called a population and components of $P$ are called members of
the population. The initial population $P_0$ is chosen randomly. On each
iteration $i = 1, 2, \ldots$ the Genetic Algorithm produces a new population
$P_i$ by means of random operators. The goal is to produce a population which
contains a solution to the problem. One iteration of the Genetic Algorithm
simulates natural evolution. A so-called fitness function
$Fit : S \to \mathbb{R}_+$ implicitly directs this evolution: members of the
current population $P_i$ with higher fitness value have more impact on
generating the next population $P_{i+1}$. The function $Fit$ measures how
close a given member $m$ is to a solution. To halt the algorithm one has to
provide in advance a termination condition and check whether it holds or not
on each iteration. The basic structure of the standard Genetic Algorithm is
given in Figure 3.

The choice of random operators and evaluating functions is crucial here. This 
requires some problem specific knowledge and a good deal of intuition. Below 
we give detailed description of the major components of the genetic algorithm 
GWA for solving Problem A. 

3.1 Solutions and members of the population 

Solutions to Problem A are finite sequences of Whitehead automorphisms
which carry a given tuple $U \in F^k$ to a minimal tuple $U_{min}$. As we have
mentioned above, one may use only automorphisms of the type 2) for this
problem. Moreover, not all automorphisms of the type 2) are needed (recall
that the large number of such automorphisms is the main obstacle for the
standard Whitehead algorithm DWA). What are optimal sets of automorphisms is
an interesting problem which we are going to address in but our preliminary
experiments show that the following set gives the best results up to date.

procedure Genetic Algorithm

Initialize current population P;

Compute fitness values Fit(m), for all m in P;

WHILE NOT the termination condition satisfied DO

// we assume that greater values of the function Fit correspond to better
// solutions; then the probability Pr(m) of the member m in P to be
// selected is
//
//     Pr(m) = Fit(m) / (sum of Fit(m_i) over all m_i in P)

Create new members by applying crossover and/or mutation to the selected
members;

Generate a new population by replacing members of the current population
by the new ones;

Recompute fitness values;

END WHILE LOOP

Figure 3: Structure of the standard Genetic Algorithm

Let $X = \{x_1, \ldots, x_n\}$ and $F = F_n(X)$. Denote by $T = T_n$ the
following set of Whitehead automorphisms:

(W1) $x_i \to x_i^{-1}$, $x_l \to x_l$,
(W2) $x_i \to x_j^{\pm 1}x_i$, $x_l \to x_l$,
(W3) $x_i \to x_ix_j^{\pm 1}$, $x_l \to x_l$,
(W4) $x_i \to x_j^{-1}x_ix_j$, $x_l \to x_l$,

where $i \neq j$ and $i \neq l$.

We call $T$ the restricted set of Whitehead transformations. It follows from
[Tl) that $T$ generates $Aut(F)$. Hence any solution to Problem A can be
represented by a finite sequence of transformations from $T$. Notice that $T$
has far fewer elements than $\Omega_n$:

$$|T| = 5n^2 - 4n.$$

We define the search space $S$ as the set of all finite sequences
$\mu = \langle t_1, \ldots, t_s \rangle$ of transformations from $T$. For such
$\mu$ and a tuple $U \in F^k$ we define $U\mu = Ut_1 \ldots t_s$.
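The count $|T| = 5n^2 - 4n$ can be verified by enumerating the restricted set. The sketch below is illustrative only: the triple `(kind, i, image)` used to encode a transformation, the integer encoding of letters (x_i as i, its inverse as -i), and the function names are all assumptions made here.

```python
def restricted_set(n):
    """Yield the transformations of T_n as (kind, i, image of x_i).
    Every generator other than x_i is fixed."""
    for i in range(1, n + 1):
        yield ("W1", i, (-i,))                 # x_i -> x_i^{-1}
        for j in range(1, n + 1):
            if j == i:
                continue
            for e in (j, -j):
                yield ("W2", i, (e, i))        # x_i -> x_j^{+-1} x_i
                yield ("W3", i, (i, e))        # x_i -> x_i x_j^{+-1}
            yield ("W4", i, (-j, i, j))        # x_i -> x_j^{-1} x_i x_j

def apply_restricted(t, word):
    """Apply one transformation of T_n to a word and freely reduce."""
    _, i, image = t
    inverse_image = tuple(-y for y in reversed(image))
    out = []
    for letter in word:
        if letter == i:
            piece = image
        elif letter == -i:
            piece = inverse_image
        else:
            piece = (letter,)
        for y in piece:
            if out and out[-1] == -y:
                out.pop()                      # cancel a pair x x^{-1}
            else:
                out.append(y)
    return tuple(out)

def apply_sequence(mu, word):
    """Compute U mu = U t_1 ... t_s for a member mu = <t_1, ..., t_s>."""
    for t in mu:
        word = apply_restricted(t, word)
    return word
```

For n = 10 this gives 460 transformations, compared with the $A_{10} = 5242860$ automorphisms of the type 2) scanned by ELR.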

At the beginning the algorithm generates an initial population by randomly
selecting members. How to choose the size of the initial (and every other)
population is a non-trivial matter. It is clear that the bigger the
population, the larger the part of the search space which is explored in one
generation. But the trade-off is that we may be spending too much time
evaluating fitness values of members of the population. We do not know the
optimal size of the population, but populations with 50 members seem to give
satisfactory results.

3.2 Evaluation methods 

Fitness function Fit provides a mechanism to assess members of a given popu- 
lation P. 

Recall that the aim of GWA is to find a sequence of transformations
$\mu = \langle t_1, \ldots, t_s \rangle$, $t_i \in T$, such that

$$U\mu = U_{min}$$

for a given input $U \in F^k$. So members $\mu$ of a given population $P$ with
smaller total length $|U\mu|$ are closer to a solution, i.e., "fitter", than
the other members. Therefore we define the fitness function $Fit$ as

$$Fit(\mu) = \max_{\lambda \in P}\{|U\lambda|\} - |U\mu|.$$

Observe that members with higher fitness values are closer to a solution
$U_{min}$ with respect to the metric on the graph $\Gamma(F, k, T)$. In fact,
we have two different implementations of the evaluation criterion: the one
above, and another one in which a word is considered as a cyclic word, so we
evaluate fitness values of cyclic permutations of $U\mu$.
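As an illustration of the (non-cyclic) evaluation criterion, the sketch below computes fitness values from the lengths $|U\mu|$ and performs fitness-proportional selection. The function names and the uniform fallback for the case when all fitness values are zero are assumptions made here, not part of the original description.

```python
import random

def fitness_values(lengths):
    """Fit(mu) = max over the population of |U lambda|, minus |U mu|.
    `lengths` lists |U mu| for each member mu of the population."""
    worst = max(lengths)
    return [worst - length for length in lengths]

def select(population, fits, rng=random):
    """Pick one member with probability proportional to its fitness.
    If every fitness is zero (all members give the same length), fall
    back to a uniform choice; this fallback is an assumption."""
    if sum(fits) == 0:
        return rng.choice(population)
    return rng.choices(population, weights=fits, k=1)[0]
```

With this definition the member producing the longest tuple always has fitness 0, so under fitness-proportional selection it is never chosen unless the whole population ties.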

3.3 Termination condition 

Termination condition is a tool to check whether a given population contains a 
solution to the problem or not. 

In the case of Whitehead method there are several ways to define a termi- 
nation condition. 

T1) Once a new population $P_n$ has been defined and all members of it have
been evaluated, one may check whether or not $P_n$ contains a solution to
Problem A. To this end one can run the "Elementary Length Reduction" routine
on $U\mu^*$ for each fittest member $\mu^* \in P_n$ until $U_{min}$ is found.
Theoretically, it is a good termination condition, but, as we have mentioned
already, running ELR might be very costly.

T2) If for a given tuple $U$ we know in advance the length $|U_{min}|$ of a
minimal tuple (for example, when $U$ is a part of a basis of $F$), then we
define another (fast) termination condition as $|U\mu^*| = |U_{min}|$ for some
fittest member $\mu^* \in P_n$.

T3) Suppose now that we do not know $|U_{min}|$ in advance, but we know
the expected number of populations, say $E = E(U)$ (or some estimates for
it), which is required for the genetic algorithm GWA to find $U_{min}$ when
starting on a tuple $U$. In this case we can use the following strategy: if
the algorithm keeps working without improving on the fitness value
$Fit(\mu^*)$ of the fittest members $\mu^*$ for long enough, say for the last
$pE$ generations (where $p > 1$ is a fixed constant), then it halts and gives
$U\mu^*$ for some fittest $\mu^*$ as an outcome. If the number $E = E(U)$ is
sufficiently small this termination condition could be efficient enough.
Below, we will describe some techniques and numerical results on how one can
estimate the number $E(U)$. Of course, in this case there is no guarantee that
the tuple $U\mu^*$ is indeed minimal. We refer to such termination conditions
as heuristic ones, while the condition T1 is deterministic.

T4) One can combine conditions T3 and T1 in the following way. The
algorithm uses the heuristic termination condition T3 and then checks (using
T1) whether or not the output $U\mu^*$ is indeed minimal. It is less costly
than T1 (since we do not apply T1 at every generation) and more costly than
T3.

3.4 Stochastic operators 

There are five basic random operators that were used in the algorithm.

3.4.1 One point crossover 

Let $\mu_1 = \langle t_1, \ldots, t_e \rangle$ and
$\mu_2 = \langle s_1, \ldots, s_l \rangle$ be two members of a population
$P_n$ which are chosen with respect to some selection method. Given two random
numbers $0 < p < e$ and $0 < q < l$ the algorithm constructs two offspring
$o_1$ and $o_2$ by recombination as follows:

$$o_1 = \langle t_1, \ldots, t_{p-1}, s_q, \ldots, s_l \rangle, \quad
o_2 = \langle s_1, \ldots, s_{q-1}, t_p, \ldots, t_e \rangle.$$

3.4.2 Mutations 

The other four operators $M_{att}$, $M_{ins}$, $M_{del}$, $M_{rep}$ act on a
single member of a population and are usually called mutations. They attach,
insert, delete, or replace some transformation in a member. Namely, let
$\mu = \langle t_1, \ldots, t_l \rangle$ be a member of a population. Then:

$M_{att}$ attaches a random transformation $s \in T$

$$M_{att} : \langle t_1, \ldots, t_l \rangle \to \langle t_1, \ldots, t_l, s \rangle;$$

$M_{ins}$ inserts a random transformation $s \in T$ into a randomly chosen
position $i$

$$M_{ins} : \langle t_1, \ldots, t_l \rangle \to \langle t_1, \ldots, t_{i-1}, s, t_i, \ldots, t_l \rangle;$$

$M_{del}$ deletes the transformation in a randomly chosen position $i$

$$M_{del} : \langle t_1, \ldots, t_l \rangle \to \langle t_1, \ldots, t_{i-1}, t_{i+1}, \ldots, t_l \rangle;$$

$M_{rep}$ replaces the randomly chosen $t_i$ by a randomly chosen $s \in T$

$$M_{rep} : \langle t_1, \ldots, t_l \rangle \to \langle t_1, \ldots, t_{i-1}, s, t_{i+1}, \ldots, t_l \rangle.$$

Operator $M_{att}$ is a special case of $M_{ins}$, but it is convenient to
have it as a separate operator (see remarks in Section 3.5.1).
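The five operators can be sketched on members stored as Python lists. This is an illustration under stated assumptions: the function names are invented here, the cut points are allowed to equal the member length so that members of length one can still be crossed, and a mutation of an empty member falls back to attachment. The weights 0.7 and 0.1 are the mutation probabilities reported for GWA in Section 3.5.1.

```python
import random

KINDS = ("att", "ins", "del", "rep")
WEIGHTS = (0.7, 0.1, 0.1, 0.1)   # p_M for M_att, M_ins, M_del, M_rep

def crossover(m1, m2, rng=random):
    """One-point crossover with 1-based cut points p and q:
    o1 = t_1..t_{p-1} s_q..s_l and o2 = s_1..s_{q-1} t_p..t_e."""
    p = rng.randint(1, len(m1))
    q = rng.randint(1, len(m2))
    return m1[:p - 1] + m2[q - 1:], m2[:q - 1] + m1[p - 1:]

def random_mutation_kind(rng=random):
    """Choose which mutation to apply according to WEIGHTS."""
    return rng.choices(KINDS, weights=WEIGHTS, k=1)[0]

def mutate(member, kind, T, rng=random):
    """Return a mutated copy of `member`; T is the list of transformations."""
    s = rng.choice(T)
    if kind == "att" or not member:
        return member + [s]                      # attach at the end
    i = rng.randrange(len(member))               # 0-based position
    if kind == "ins":
        return member[:i] + [s] + member[i:]     # insert before t_i
    if kind == "del":
        return member[:i] + member[i + 1:]       # delete t_i
    if kind == "rep":
        return member[:i] + [s] + member[i + 1:] # replace t_i by s
    raise ValueError(f"unknown mutation kind: {kind}")
```

Note that crossover preserves the total number of transformations in the two members, while attachment and insertion lengthen a member and deletion shortens it.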
3.4.3 Replacement

In this section we discuss a protocol to construct members of the next population 
Pnew from the current population P. 

First, we select randomly two members $\mu, \lambda$ from $P$. The
probability to choose a member $m$ from $P$ is equal to

$$Pr(m) = \frac{Fit(m)}{\sum_{m_i \in P} Fit(m_i)}.$$

With small probability (0.10 - 0.15) we add both $\mu$ and $\lambda$ to an
intermediate population $P'_{new}$. Otherwise, we apply the crossover operator
to $\mu$ and $\lambda$ and add the offspring to $P'_{new}$. We repeat this
step until we get the required number of members in $P'_{new}$ (in our case
50).

Secondly, to every member $m \in P'_{new}$ we apply a random mutation $M$
with probability 0.85 and add the altered member to the new population
$P_{new}$. The choice of $M$ is governed by the corresponding probabilities
$p_M$. Otherwise (with probability 0.15) we add the member $m$ to $P_{new}$
unchanged. We refer to Section 3.5.1 for a detailed discussion of our choice
of the probabilities $p_M$.

In addition, the solution with the highest fitness value among all
previously occurred solutions is always added to the new population
(replacing a weakest one). This implies that if we denote by $\mu_n$ one of
the fittest members of a population $P_n$ then

$$|U\mu_0| \ge |U\mu_1| \ge \ldots$$

3.5 Some important features of the algorithm 
3.5.1 Precise solutions and local search 

It has been shown that different heuristics and randomized methods can be
combined together, often resulting in more efficient hybrid algorithms.
Genetic algorithms are good at covering large areas of the search space.
However, they may fail when a more thorough exploration of a local
neighborhood is required. In the case of symbolic computations this becomes
an important issue since we are looking for an exact solution, not an
approximate one. Even if the current best member of a population is one step
away from the optimum, it might take some time for the standard genetic
algorithm to find it. In our case, experiments show that the standard genetic
algorithm can quickly reach the neighborhood of the optimum, but it may get
stuck, being unable to hit the right solution. To avoid that, one could add a
variation of the local search procedures to the standard genetic algorithm.

In GWA a kind of gradient descent procedure was implicitly introduced
via mutation operators. Observe that, in general, if $M \neq M_{att}$ then
for a given member $\mu$ the tuple $UM(\mu)$ lies far apart from $U\mu$ in
the graph $\Gamma(F, k, T)$. However, the mutation $M_{att}$ always gives a
tuple $UM_{att}(\mu)$ at distance 1 from $U\mu$ in the graph
$\Gamma(F, k, T)$. Therefore, the greater the chance to apply $M_{att}$, the
more neighbors of $U\mu$ we can explore. It was shown experimentally that GWA
performs much better when $M_{att}$ has a greater chance to occur. We used
$p_{M_{att}} = 0.7$, and $p_M = 0.1$ for $M \neq M_{att}$.

3.5.2 Substitution Method 

One of the major concerns when dealing with a search problem is that the
algorithm may fall into a local minimum. Fortunately, Theorem 1 shows that
every local minimum of the fitness function $Fit$ is, in fact, a global one.
This allows one to introduce another operator, which we call Substitution,
and which is used to speed up the convergence of the algorithm.

Suppose that the algorithm found a member $\mu_n \in P_n$ which is fitter
than all the members of the previous population $P_{n-1}$ (a genetic
variation of the ELR routine). Then we want our algorithm to focus more on
the tuple $U\mu_n$ rather than to spend its resources on a useless search
elsewhere. To this end, we stop the algorithm and restart it, replacing the
initial tuple $U$ with the tuple $U\mu_n$ (of course, memorizing the sequence
$\mu_n$). That is a genetic variation of Whitehead's gradient descent (see
Section 2.2). This simple method has tremendously improved the performance of
the algorithm. In a sense, this substitution turns GWA into an algorithm
which solves a sequence of Length Reduction Problems.

4 Experiments and Results 

Let $F = F_r(X)$ be a free group of rank $r$ with basis $X$. For simplicity
we describe here only experiments with Whitehead algorithms on inputs from
$F$ (not arbitrary $k$-tuples from $F^k$). Moreover, in the present paper we
focus only on the time-complexity of Problem A, leaving the discussion of
Problem B for the future. In fact, we discuss mostly the length reduction
problem LRP, as a more fundamental problem. In our experiments we choose
ranks $r = 2, 5, 10, 15, 20$. Before going into details it is worthwhile to
discuss a few basic problems of statistical analysis of experiments with
infinite groups.

4.1 Experimenting with infinite groups. 

In this section we discuss briefly several general problems arising in experiments 
with infinite groups. 

Let $A$ be an algorithm for computing with elements from a free group
$F = F_r(X)$. Suppose that the set of all possible inputs for $A$ is an
infinite subset $S \subset F$. Statistical analysis of experiments with $A$
involves three basic parts:

• creating a finite set of test inputs $S_{test} \subset S$,

• running $A$ on inputs from $S_{test}$ and collecting outputs,

• statistical analysis of the resulting data.

The following is the main concern when creating $S_{test}$.

Random Generation of the test data: How one can generate pseudo- 
randomly a finite subset Stest C S which represents adequately the whole set 
S? 

The notion of a random element in $F$, or in $S$, depends on a chosen measure
on $F$. Since $F$ is infinite, elements in $F$ are not uniformly distributed.
The problem cannot be solved just by replacing $F$ with a finite ball $B_n$
of all elements in $F$ of length at most $n$, for a big number $n$. Indeed,
firstly, the ball $B_n$ is too big for any practical computations; secondly,
from the group-theoretic viewpoint elements in $B_n$ usually are not
uniformly distributed. We refer to pOj and |2] for a thorough discussion of
this matter.

The main problem when collecting results of the runs of the algorithm A 
on inputs from Stest is pure practical: our resources in time and computer 
power are limited, so the set Stest has to be as small as possible, though still 
representative. 

Minimizing the cost: How to make the set Stest as small as possible, but 
still representative? 

Below we use the following technique to ensure representativeness of $S_{test}$.
Assume we already have a procedure to generate pseudo-random elements in
$S$. Let $\chi(S_{test})$ be some computable numerical characteristic of the set $S_{test}$,
which represents a "feature" that we are going to test. Fix a small real number
$\varepsilon > 0$. We start creating $S_{test}$ by generating an initial subset $S_0 \subseteq S$ which we
can easily handle within our resources. Now we enlarge the set $S_0$ to a new set
$S_1$ by pseudo-randomly adding reasonably many new elements from $S$, and
check whether the inequality

$$|\chi(S_0) - \chi(S_1)| < \varepsilon$$

holds or not. We repeat this procedure until the inequality holds for $N$ consecutive
steps $S_i, S_{i+1}, \ldots, S_{i+N}$, where $N$ is a fixed preassigned number. In this event we
stop and take $S_{test} = S_i$.
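
The stopping rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `build_test_set` and the parameters `generate` (a callable producing one pseudo-random element of the input set), `chi` (the numerical characteristic), `batch_size` and `max_rounds` are hypothetical names introduced here.

```python
def build_test_set(generate, chi, eps, n_stable, initial_size, batch_size,
                   max_rounds=1000):
    """Grow a sample until chi changes by less than eps for n_stable
    consecutive enlargements; return the set at the start of that stable run.

    All parameter names are illustrative assumptions, not taken from the paper.
    """
    sample = [generate() for _ in range(initial_size)]
    previous = chi(sample)
    stable_steps = 0
    stable_size = len(sample)
    for _ in range(max_rounds):
        size_before = len(sample)
        # Enlarge the current set by a batch of new pseudo-random elements.
        sample.extend(generate() for _ in range(batch_size))
        current = chi(sample)
        if abs(current - previous) < eps:
            if stable_steps == 0:
                # Remember the size of S_i, the first set of the stable run.
                stable_size = size_before
            stable_steps += 1
            if stable_steps == n_stable:
                return sample[:stable_size]
        else:
            stable_steps = 0
        previous = current
    raise RuntimeError("characteristic did not stabilize within max_rounds")
```

The guard `max_rounds` is an addition of this sketch: the procedure in the text has no explicit bound on the number of enlargements.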

Statistical analysis of the experiments depends on the features that are going
to be tested (average running time of the algorithm, expected frequencies of
outputs of a given type, etc.). For example, estimates of the running time
of the algorithm $A$ depend on how we measure the "complexity" or "size" of the
inputs $s \in S$. It turned out that the running time of the genetic Whitehead
algorithm GWA does not depend essentially on the length of an input word $s$,
so it would be meaningless to measure the time complexity of GWA in terms of
the length of $s$ (as is customary in computer science). So the following problem
is crucial here.

Finding adequate complexity functions: Find a complexity function
on $S$ which is compatible with the algorithm $A$.

Below we suggest some particular ways to approach all these problems in
the case of Whitehead algorithms.

4.2 Random elements in F and Whitehead algorithms

It seems that the most obvious choice for the set $S_{test}$ to test the performance
of various Whitehead algorithms would be a finite set $S_F$ of randomly chosen
elements from $F$. It turned out that this choice is not good at all, since with
high probability a random element in $F$ is already minimal. Nevertheless, the
set $S_F$ plays an important part in the sequel as a base for other constructions.

A random element $w$ in $F = F_r(X)$ can be produced as the result of a no-return
simple random walk on the Cayley graph of $F$ with respect to the set of
generators $X$ (see [2] for details). In practice this amounts to a pseudo-random
choice of a number $l$ (the length of $w$), and a pseudo-random sequence $y_1, \ldots, y_l$
of elements $y_i \in X^{\pm 1}$ such that $y_i \neq y_{i+1}^{-1}$, where $y_1$ is chosen randomly from
$X^{\pm 1}$ with probability $1/(2r)$, and all others are chosen randomly with probability
$1/(2r - 1)$. It is convenient to structure the set $S_F$ as follows:

$$S_F = \bigcup_{l=1}^{L} \{\, w_{l,i} \mid i = 1, \ldots, K \,\}$$

where $w_{l,i}$ is a random word of length $l$ and $L$, $K$ are parameters.
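
The no-return random walk described above is easy to simulate. The following Python sketch is an illustration only; the encoding of letters as nonzero integers (`i` for the i-th generator, `-i` for its inverse) and the names `random_reduced_word` and `sample_S_F` are assumptions made here, not notation from the paper.

```python
import random

def random_reduced_word(rank, length, rng=random):
    """Return a freely reduced word of the given length in a free group of
    the given rank. Letter i stands for x_i and -i for its inverse."""
    letters = [sign * i for i in range(1, rank + 1) for sign in (1, -1)]
    word = []
    for _ in range(length):
        if not word:
            # First letter: uniform over all 2r letters.
            word.append(rng.choice(letters))
        else:
            # Later letters: uniform over the 2r - 1 letters that do not
            # cancel the previous one (the "no-return" condition).
            allowed = [y for y in letters if y != -word[-1]]
            word.append(rng.choice(allowed))
    return word

def sample_S_F(rank, max_length, per_length, rng=random):
    """Return a dict mapping each length l = 1..max_length to per_length
    random reduced words of that length (the parameters L and K in the text)."""
    return {l: [random_reduced_word(rank, l, rng) for _ in range(per_length)]
            for l in range(1, max_length + 1)}
```

Passing an explicit `random.Random(seed)` instance as `rng` makes a test set reproducible.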

To find all minimal elements in $S_F$ we run the standard deterministic Whitehead
algorithm DWA on every $s \in S_F$. Since DWA is very slow for big ranks
we experimented with free groups $F = F_r$ for $r = 3, 4, 5$. In Figure 4 we present
the fractions of minimal elements among all elements of a given length in $S_F$.

Figure 4: Fractions of Whitehead-minimal elements in a free group $F_r$, $r = 3, 4, 5$.

This experimental data leads to the following statement.

Conclusion 2 Almost all elements in $F_r$, $r > 2$, are Whitehead minimal.

We refer to Section for a rigorous formulation of the corresponding mathe- 
matical statement. 

The running time $T_{DWA}(w)$ of the standard Whitehead algorithm DWA on
a minimal input $w$ is very easy to estimate. Indeed, in this case DWA applies
the substitution routine SR for every Whitehead automorphism of the second
type. Since there are $A_r$ such automorphisms (see Section 2.2),

$$A_r \le T_{DWA}(w) \le c \cdot A_r \cdot |w|.$$

The time spent by the genetic algorithm GWA on a random input $w$ depends
solely on the built-in termination condition: if it is heuristic (see Section I575|),
then GWA stops after $pE(w)$ iterations, where $E(w)$ is the expected running
time for GWA on the input $w$; if it is deterministic then again it takes $A_r$ steps
for GWA to halt. This shows that the set $S_F$ does not really test how GWA
works; instead, it tests only the termination conditions.

We summarize the discussion above in the following statement. 

Conclusion 3 The time-complexity of the Whitehead algorithms DWA and GWA
on generic inputs from $S_F$ is easy to estimate. The set $S_F$ does not provide any
means to compare the algorithms DWA and GWA.

It follows that one has to test Whitehead algorithms on inputs $w \in F$ which are
non-minimal.

4.3 Complexity of Length Reduction Problem 

In this section we test our genetic algorithm GWA on the length reduction
problem LRP, which is the main component of Whitehead's Method.

To this end we generate a finite set $S_{NMin}(r)$ of non-minimal elements in a
free group $F_r$, for $r = 2, 5, 10, 15, 20$, by applying random Whitehead automorphisms
to elements from $S_F$. More precisely, put

$$S_{NMin}(r) = \{\, w_{l,i}\varphi_i \mid 1 \le l \le L,\ 1 \le i \le K \,\}$$

where $\varphi_i$ is a randomly chosen Whitehead automorphism of type 2 and $w_{l,i} \in S_F$
with $|w_{l,i}| < |w_{l,i}\varphi_i|$. Since almost all elements from $S_F$ are minimal it is easy
to generate a set like $S_{NMin}(r)$. Notice that elements in $S_{NMin}(r)$ are not
randomly chosen non-minimal elements from $F$; they are non-minimal elements
at distance 1 from minimal ones. We will have more to say about this in the
next section.

The results of our experiments indicate that the average time required for
GWA to find a length-reducing Whitehead automorphism for a given non-minimal
element $w \in S_{NMin}(r)$ does not depend significantly on the length
of the word $w$.



Let $T_{gen}(w)$ be the number of iterations required for GWA to find a length-reducing
automorphism for a given $w \in F$ during a particular run of GWA on
the input $w$. We compute the average value of $T_{gen}(w)$ on inputs $w \in S_{NMin}(r)$
of a given "size". If the length of a word $w$ is taken as its size then we obtain
the following time complexity function with respect to the test data $S_{NMin}(r)$:

$$T_r(m) = \frac{1}{|S_m|} \sum_{w \in S_m} T_{gen}(w)$$

where $S_m = \{ w \in S_{NMin}(r) \mid |w| = m \}$.
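
Computing such an averaged complexity function from raw experimental records is straightforward. The sketch below assumes, purely for illustration, that each run of GWA is recorded as a pair `(size, iterations)`; the function name `average_iterations_by_size` is hypothetical.

```python
from collections import defaultdict

def average_iterations_by_size(runs):
    """Given (size, iterations) pairs, return {size: mean iterations}.

    With size = word length this is the averaged function T_r(m) of the text;
    the record format is an assumption of this sketch.
    """
    totals = defaultdict(lambda: [0, 0])  # size -> [sum of iterations, count]
    for size, iterations in runs:
        totals[size][0] += iterations
        totals[size][1] += 1
    return {size: s / n for size, (s, n) in sorted(totals.items())}
```

The same function applies unchanged if another "size" function is used in place of the word length.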

Values of $T_r(m)$ are presented in Figure 5 for free groups with $r = 2, 3, 5, 10, 15, 20$.
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Figure 5: Values of T, 5 = 5*1. 

We can see from the graphs that the function $T_r$ grows for small values of
$|w|$ and then stabilizes at some constant value $T_r^*$. This shows that $T_r$ does not
depend on the word's length and depends only on the rank $r$ (for long enough
words $w$).

In Table 2 we give correlation coefficients between $T_r$ and $|w|$ for $r = 2, 5, 10, 15, 20$,
which are sufficiently small.

We summarize the discussion above in the following statements. 

Conclusion 4 The number of iterations required for GWA to find a length-reducing
automorphism for a given non-minimal input $w$ does not depend on
the length $|w|$; it depends only on the rank $r$ (for long enough input words).

               F_2       F_5       F_10      F_15      F_20
all words     -0.012    -0.016     0.015     0.03      0.072
|w| > 100     -0.011    -0.03     -0.019    -0.025    -0.005

Table 2: Correlation between $|w|$ and $T_r$.



Recall that a similar phenomenon was observed for the deterministic Whitehead
algorithm in Conclusion ^

Conclusion 5 One has to replace the length size function by a more sensitive
"size" function when measuring the time-complexity of the Length Reduction
Problem.

Conclusion 6 For each free group $F_r$ the time-complexity function $T_r$ is bounded
from above by some constant value $T_r^*$.

We can try to estimate the value $T_r^*$ as the expected number of generations

$$E(r) = \frac{1}{|S_{NMin}(r)|} \sum_{w \in S_{NMin}(r)} T_{gen}(w)$$

required for GWA to find a length-reducing automorphism for generic non-minimal
elements from $F_r$. Notice that we use $E(r)$ in the heuristic termination
condition TC3 (see Section rO|) for the algorithm GWA.

Of course, the conclusions above are not mathematical theorems; they are
just empirical phenomena that can be seen from our experiments based on
the test set $S_{NMin}(r)$. It is important to make sure that the set $S_{NMin}(r)$ is
sufficiently representative.

To this end, we made sure, firstly, that the distributions of lengths of words
from the set $S_{NMin}(r)$ are similar for different ranks (using the variable $l$).
Secondly, our choice of the parameter $K$ in the construction of $S_{NMin}(r)$ ensures
representativeness of the test data with respect to the characteristic $E(r)$.
Namely, we select $K$ such that for larger values $K' > K$ the corresponding
value $E_{K'}(r)$ does not differ significantly from $E_K(r)$ (here $E_K(r)$ is the value
corresponding to the data set $S_{NMin}(r)$ with the parameter $K$).

Values of $E(r)$ for different $K$ and $r$ are given in Table 3.

  K       E(2)      E(5)      E(10)     E(15)     E(20)
  ...     ...       2.43      6.55      11.48     16.98
  200     1.009     2.42      6.44      11.47     17.17
  300     1.008     2.42      6.43      11.39     17.3
  400     1.007     2.39      6.43      11.40     17.38
  500     1.007     2.44      6.43      11.39     17.4

Table 3: $E_K(r)$ for different values of $K$ and $r$.

4.4 Complexity functions

In this section we discuss possible complexity, or size, functions suitable to
estimate the time-complexity of different variations of Whitehead algorithms.
Below we suggest a new complexity function based on the distance in the Whitehead
graph.

Let $F = F_r$, let $Y \subseteq Aut(F)$ be a set of generators of the group $Aut(F)$, and let
$\Gamma(F, Y) = \Gamma(F, 1, Y)$ be the Whitehead graph on $F$ relative to $Y$ (see Section ). For a
word $w \in F$ we define $WC_Y(w)$ as the minimal number of automorphisms from
$Y^{\pm 1}$ required to reduce $w$ to a minimal one $w_{min}$. Notice that $WC_Y(w)$ is
the length of a geodesic path in $\Gamma(F, Y)$ from $w$ to some $w_{min}$. If $Y$ is the
set $\Omega_r$ of all Whitehead automorphisms then we call $WC_Y(w)$ the Whitehead
complexity of $w$ and denote it by $WC(w)$. Similarly, one can introduce the
Nielsen complexity of $w$, $T$-complexity, etc. In this context minimal elements
have zero Whitehead complexity.

Claim The Whitehead's complexity function WC{w) is an adequate com- 
plexity function to measure performance of various modifications of the White- 
head's algorithm. 

Indeed, let $\mathcal{K}$ be a class of Whitehead-type algorithms which use an arbitrary
generating set $Y \subseteq \Omega_r$ of Whitehead automorphisms to find a minimal
word $w_{min}$ for an input word $w$. The best possible algorithm of this type is
the non-deterministic Whitehead algorithm NDWA with an oracle that at each
step $i$ gives a length-reducing automorphism $t_i \in Y$ such that
$|w t_1 \ldots t_i| < |w t_1 \ldots t_{i-1}|$. Clearly, it takes $WC_Y(w)$ steps for NDWA to produce $w_{min}$.
Thus, measuring the efficiency of an algorithm $A \in \mathcal{K}$ in terms of $WC_Y$ gives us a
comparison of the performance of $A$ to the performance of the best possible
algorithm in the class.

Remark 1 Notice that the set $S_{NMin}(r)$ is a pseudo-random sampling of elements
$w \in F_r$ with $WC(w) = 1$. This explains the behavior of the function
$T_r$ in Figure 5. The number of iterations required for GWA to find a length-reducing
automorphism depends on the Whitehead complexity, not on the lengths of
the words.

Of course, WC complexity is mostly a theoretical tool, since, in general, it is
harder to compute $WC(w)$ than to find $w_{min}$. It follows from Whitehead's
fundamental theorem that $WC(w) \le |w|$ for every $w \in F$. In Table 4 we collect
some experimental results on the relation between $WC(w)$ and $|w|$.

This leads to the following

Conclusion 7 Let $W_m = \{ w \in F_r \mid WC(w) = m \}$. Then there exists a
constant $c_r$ such that

$$|w| \ge c_r^m$$

for "most" elements in $W_m$.

$|wt|/|w|$, $t \in T$:     1.06     ...     1.07     1.06

Table 4: $WC(w)$ vs $|w|$.



For the stochastic algorithm GWA one can define an average time complexity
function $T_{r,Y}(m)$ with respect to the test data $S_{NMin}(r)$ and the "size" function
$WC_Y$ as follows:

$$T_{r,Y}(m) = \frac{1}{|S_m|} \sum_{w \in S_m} T_{gen}(w)$$

where $S_m = \{ w \in S_{NMin} \mid WC_Y(w) = m \}$.

Conjecture 1 The average number of iterations required for GWA to find $w_{min}$
on an input $w \in F$ depends only on $WC(w)$ and the rank of the group $F$.

We discuss some experiments made to verify Conjecture 1 in Section 4.5.

4.5 Experiments with primitive elements 

In this section we discuss results of experiments with primitive elements. Recall
that elements from the orbit $Orb(x_i)$, where $x_i \in X$, are called primitive
in $F(X)$. Experimenting with primitive elements has several important advantages:

• in general, primitive elements $w$ require long chains of Whitehead automorphisms
(relative to $|w|$) to get to $w_{min}$,

• one can easily generate pseudo-random primitive elements,

• the genetic algorithm GWA has a perfect termination condition $|w_{min}| = 1$
for primitive elements $w$.

Thus, primitive elements provide optimal test data to compare various modifications
of the Whitehead algorithm and to verify (experimentally) the conjectures
and conclusions stated in the previous sections.

We generate primitive elements in the form $x\varphi$, where $x$ is a random element
from $X$ and $\varphi$ is a random automorphism of $F$ given by a freely reduced product
$\varphi = t_1 \ldots t_l$ of $l$ randomly and uniformly chosen automorphisms from $T$ with
$t_i \neq t_{i+1}^{-1}$ (see the comments for $S_F$). The number $l = l(\varphi)$ is called the length
of $\varphi$.
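
The construction of a pseudo-random primitive element can be illustrated as follows. The paper's restricted set T is defined elsewhere and is not reproduced here; as a stand-in, this sketch uses only elementary Nielsen automorphisms that multiply one generator by another generator or its inverse, on the left or on the right. That choice, the integer encoding of letters, and all function names are assumptions of the sketch.

```python
import random

def free_reduce(word):
    """Cancel adjacent pairs y, -y (letter i is x_i, -i is its inverse)."""
    out = []
    for y in word:
        if out and out[-1] == -y:
            out.pop()
        else:
            out.append(y)
    return out

def apply_nielsen(word, t):
    """Apply t = (i, j, eps, side): x_i -> x_i x_j^eps (side 'right') or
    x_i -> x_j^eps x_i (side 'left'); all other generators are fixed."""
    i, j, eps, side = t
    image = []
    for y in word:
        if abs(y) != i:
            image.append(y)
        elif side == "right":
            image.extend([i, eps * j] if y > 0 else [-eps * j, -i])
        else:
            image.extend([eps * j, i] if y > 0 else [-i, -eps * j])
    return free_reduce(image)

def inverse(t):
    i, j, eps, side = t
    return (i, j, -eps, side)

def nielsen_moves(rank):
    """Stand-in generating moves (an assumption, not the paper's set T)."""
    return [(i, j, eps, side)
            for i in range(1, rank + 1) for j in range(1, rank + 1) if i != j
            for eps in (1, -1) for side in ("right", "left")]

def random_primitive(rank, length, rng=random):
    """Return (x, chain, word): word is the image of generator x under a
    product of `length` random moves with no move followed by its inverse.
    Requires rank >= 2."""
    moves = nielsen_moves(rank)
    x = rng.randint(1, rank)
    word, chain = [x], []
    for _ in range(length):
        t = rng.choice(moves)
        while chain and t == inverse(chain[-1]):
            t = rng.choice(moves)
        chain.append(t)
        word = apply_nielsen(word, t)
    return x, chain, word
```

Since every move is an automorphism, the resulting word is primitive by construction; applying the inverse moves in reverse order recovers the generator `x`.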

In general, a random automorphism $\varphi$ with respect to a fixed finite set $T$ of
generators of the group $Aut(F)$ can be generated as the result of a no-return
simple random walk on the Cayley graph $\Gamma(Aut(F), T)$ of $Aut(F)$ with respect
to the set of generators $T$. Unfortunately, the structure of $\Gamma(Aut(F), T)$ is very
complex, and it is hard to simulate such a random walk effectively.

Again, for each free group $F_r$ ($r = 2, 5, 10, 15, 20$), we construct a set $S_P(r)$
of test primitive elements as follows:

$$S_P(r) = \bigcup_{l=1}^{L} \bigcup_{i=1}^{K} \{\, x\varphi_i^{(l)} \,\}$$

where $\varphi_i^{(l)}$ is a random automorphism of length $l$.

We use the data sets $S_P(r)$ to verify, using independent experiments, the
conclusions of Section lOl on the average expected time $E(r)$ required for GWA
to solve the length reduction problem in the group $F_r$. If they are true then the
expected number of iterations $Gen_r(w)$ required for GWA to produce $w_{min}$ for
a given input $w \in F_r$ satisfies the following estimate:

$$Gen_r(w) \le E(r)\,WC(w) \le E(r)\,|w| \qquad (3)$$

Let $Q_r$ be the fraction of elements $w$ in the set $S_P(r)$ for which
$Gen_r(w) \le E(r)|w|$ holds. Table 5 shows values of $Q_r$ for $r = 2, 5, 10, 20$.
We can see that $Q_r$ is close to 1 for all tested ranks, as predicted.
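
The characteristic Q_r is a simple proportion and can be computed from recorded runs as sketched below. The record format `(iterations, length)` and the function name `fraction_within_bound` are assumptions introduced for this illustration.

```python
def fraction_within_bound(runs, e_r):
    """Return the fraction of runs whose iteration count does not exceed
    e_r * |w|, where each run is an (iterations, word_length) pair.

    e_r plays the role of E(r); the record format is an assumption.
    """
    runs = list(runs)
    if not runs:
        raise ValueError("no runs given")
    hits = sum(1 for iterations, length in runs if iterations <= e_r * length)
    return hits / len(runs)
```

For instance, with `e_r = 3` a run that needed 10 iterations on a word of length 5 counts as within the bound, while a run that needed 30 iterations on such a word does not.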

In particular, we can make the following 

Conclusion 8 The genetic algorithm GWA with the termination condition T3 
gives reliable results. 





               F_2      F_5      F_10     F_15     F_20
E(r)           1        3        7        12       18
all words      0.93     0.93     0.99     0.99     0.99
|w| > 100      1.0      0.99     0.99     0.99     1.0

Table 5: Fraction of elements $w \in S_P(r)$ with $Gen_r(w) \le E(r)|w|$.

In constructing the set $S_P(r)$ we select $K$ to ensure the representativeness
of the characteristic $Q_r$ (see Table 6).

Table 6: Values of $Q_r$ computed with different values of $K$.

The data stabilizes at $K = 500$ and this is the value of $K$ used in our
experiments.

5 Time complexity of GWA

It is not easy to estimate, or even to define, the time complexity of GWA because
of its stochastic nature. However, one can estimate the time complexity of the
major components of GWA on each given iteration. Afterward, one may define
a time complexity function $T_{GWA}(s)$ as the average number of iterations required
by GWA to find a solution starting on a given input $s$.

Suppose GWA starts to work on an input $w \in F$. Below we give some estimates
for the time required for GWA to make one iteration. It is easy to see that
the total execution time $T_{CMR}(P)$ of the Crossover, Mutation, and Replacement
operators, needed to generate a new population $P_{new}$ from a given population
$P$, does not depend on the length of the input $w$ and depends only on the
cardinality of the population $P$ (which is fixed) and the length of the members $\mu$
of the current population $P$ (here $|\mu|$ is the length of the sequence $\mu$). Therefore,
for some constant $C_{CMR}$ the following estimate holds

$$T_{CMR}(P) \le C_{CMR} \cdot M_P$$

where $M_P = \max\{ |\mu| \mid \mu \in P \}$.

To compute $Fit(\mu)$ for a given $\mu \in P$ one has to run the substitution
routine SR on the input $w\mu$. Since $|wt| \le 3|w|$ for any restricted Whitehead
automorphism $t \in T$ one has $|w\mu| \le 3^{|\mu|}|w|$ for each $\mu \in P$. Hence the execution
time $T_{Fit}$ required to compute $Fit(\mu)$ can be bounded from above by

$$T_{Fit} \le C_{Fit} \cdot |w\mu| \le C_{Fit} \cdot 3^{M_P} \cdot |w|$$

This argument shows that the time $T_{gen}(P)$ required for GWA to generate
a new population from a given one $P$ can be estimated from above by

$$T_{gen}(P) \le T_{CMR}(P) + T_{Fit} \le C_{CMR} \cdot M_P + C_{Fit} \cdot 3^{M_P} \cdot |w|.$$
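
For completeness, the exponential factor in this bound can be derived by a short induction on the length of the sequence of automorphisms. The derivation below is a sketch that assumes only the single-step estimate quoted in the text; the auxiliary symbol $k$ for the length of the sequence is introduced here.

```latex
% Assumption: |u t| <= 3 |u| for every word u and every restricted
% Whitehead automorphism t in T (the single-step estimate from the text).
% Write \mu = t_1 t_2 \cdots t_k with k = |\mu| \le M_P.
\begin{align*}
|w\mu| = |(w t_1 \cdots t_{k-1})\, t_k|
  &\le 3\,|w t_1 \cdots t_{k-1}| \\
  &\le 3^{2}\,|w t_1 \cdots t_{k-2}| \\
  &\le \cdots \le 3^{k}\,|w| \le 3^{M_P}\,|w|, \\
T_{gen}(P) \le T_{CMR}(P) + T_{Fit}
  &\le C_{CMR}\, M_P + C_{Fit}\, 3^{M_P}\, |w|.
\end{align*}
```

Replacing the worst-case factor 3 by a smaller average per-step growth factor gives the corresponding average-case estimate in the same way.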

In fact, the estimate $|wt| \le 3|w|$ is very crude: as we have seen in Section ^31,
one has on average $|wt| \le c_r|w|$ and the values of $c_r$ are much smaller than 3
(see Table 4). So on average one can make the following estimate:

$$T_{gen}(P) \le C_{CMR} \cdot M_P + C_{Fit} \cdot c_r^{M_P} \cdot |w|.$$

Thus, the length of the members of the current population $P$ has a crucial impact on
the time complexity of the procedure that generates the next population.

A priori, there are no limits on the length of the population members $\mu \in P$.
However, application of the Substitution Method (Section 3.5.2) divides GWA
into a sequence of separate runs, each of which solves the Length Reduction
Problem for a current word $w_i = w t_1 \ldots t_i$. Furthermore, our experiments
show that to solve this problem GWA generates population members in $P$ of
average length $E|\mu|$ which does not depend on the length of the input $w_i$; it
depends only on the rank of $F$. In Figure 6 we present results of our experiments
with computing $|\mu|$ ($\mu \in P$) when running GWA on inputs $w$ from $S_{NMin}(r)$.
In Table 7 we collect average and maximal values of $|\mu|$ for inputs $w \in S_{NMin}(r)$
for various ranks $r$.

This experimental data allows us to state the following observed phenomenon.

Figure 6: Values of $|\mu|$ for various word lengths: a) maximal, b) average.

Table 7: Maximal and average lengths of the population members.

Conclusion 9 To solve the Length Reduction Problem for a given non-minimal
$w \in F$, GWA generates new populations in time bounded from above by $C_r|w|$,
where $C_r$ is a constant bounded from above in the worst case by

$$C_r \le C_{CMR} \cdot M_P + C_{Fit} \cdot 3^{M_P},$$

and on average by

$$C_r \le C_{CMR} \cdot M_P + C_{Fit} \cdot c_r^{M_P}.$$

Now we can estimate the expected time-complexity $T_{GWA_r}(w)$ of GWA on
an input $w \in F_r$ as follows:

$$T_{GWA_r}(w) \approx Gen_r(w) \cdot average(T_{gen}(P)) \le E(r) \cdot WC_T(w) \cdot C_r \cdot |w|.$$

We conclude this section with a comment that average values of $|\mu|$ ($\mu \in P$)
shed some light on the average height of "picks" (see Sectional) for the set $T$ of
restricted Whitehead automorphisms. This topic requires separate research and
we plan to address this issue in the future.

5.1 Comparison of the standard Whitehead algorithm with 
the genetic Whitehead algorithm 

In this section we compare results of our experiments with the standard Whitehead
algorithm DWA and the genetic algorithm GWA. We tested these algorithms
on the set $S_P$ of pseudo-random primitive elements.

As we have seen in Section |31 we may estimate the expected time required for
GWA to find a length-reducing automorphism on a non-minimal input $w \in F_r$
as:

$$C_r \cdot E(r) \cdot |w|.$$

Recall from Section l!j . '6 . II that the expected time required for DWA to find such 
an automorphism can be estimated by 



LRr 



In Table 13 and Figure |21 we collected an experimental data on average values of 
E{r) and for various free groups Fr- It seems from our experiments that 



Cr ■ E{r) « 



for big enough r. Thus, we should expect much better performance of GWA 
than DWA on groups of higher ranks. 

In Table 8 and Figure 7 we present results on the performance comparison of
GWA with an implementation of the standard Whitehead algorithm DWA
available in the MAGNUS software package J^l. We ran the algorithms on words
$w \in S_P(r)$ and measured the execution time. We terminated an algorithm if
it was unable to obtain the minimal element (of length 1) on an input $w$ after
running for more than an hour. There were very few runs of DWA for
words $w \in F_{10}$ with $|w| > 100$ that finished within an hour. There were no such
runs for $|w| > 200$ at all, and therefore the results of these experiments are marked
"na" (not available).
                            F_2                    F_5                    F_10
Word length |w|             57     104    268      ...    ...    228      ...    102    268
Time spent by the
standard algorithm, s       0.03   0.07   0.18     13.29  27.4   85.9     1995   na     na
Time spent by the
genetic algorithm, s        0.52   1.2    2.7      1.4    2.6    5.6      2.6    6.07   17.4

Table 8: Performance comparison of DWA and GWA.



Conclusion 10 GWA performs much better than DWA in free groups $F_r$ for
sufficiently big $r$ (in our experiments, $r > 5$) and on sufficiently long inputs (in
our experiments, $|w| > 10$).

Figure 7: Time comparison between standard and genetic algorithms on primitive
elements in a) $F_2$, b) $F_5$ and c) $F_{10}$ (horizontal axis: word length $|w|$).

6 Mathematical problems arising from the experiments

We believe that there must be some hidden mathematical reasons for the genetic
algorithm GWA to perform so fast. In this section we formulate several mathematical
conjectures which, if confirmed, would explain the robust performance
of GWA, and lead to improved versions of the standard GWA, or to essentially
new algorithms. We focus mostly on particular choices of the finite set of initial
elementary automorphisms, and on the geometry of connected components of the
Whitehead graph $\Gamma(F_r, 1, \Omega_r)$.

Conjecture 2 Let $U \in F^k$. Then there exists a polynomial $P_{r,k}$ such that

$$|Orb_{min}(U)| \le P_{r,k}(|U_{min}|)$$

Conjecture 3 Almost all elements in $F_r$, $r > 2$, are Whitehead minimal.

Of course, a rigorous formulation of this conjecture has to involve some probability
measure on the free group $F$. One of the typical approaches to such
problems is based on an asymptotic density on $F$ as a measuring tool. Recently,
a theoretical justification of this conjecture, relative to the asymptotic
density, appeared in [7]. Below we use the asymptotic density as our standard
measuring tool, though the measures from ^ would provide more precise
results.

The first conjecture deals with the average complexity of the standard White- 
head's descent algorithm DWA. 

Conjecture 4 Let $F = F_n$ be a free group of rank $n$, and $NMin_l \subseteq F$ the set of
all non-minimal elements in $F$ of length $l$. Then there is a constant $LR_n$ such
that



Conjecture 5 Let

$$W_m = \{ w \in F_r \mid WC(w) = m \}$$

and

$$W_{m,c_r} = \{ w \in W_m \mid |w| \ge c_r^m \}.$$

There exists a constant $c_r > 1$ such that

$$\lim_{m \to \infty} \frac{|W_{m,c_r}|}{|W_m|} = 1$$

Moreover, the convergence is exponentially fast.

Let $T = T_r$ be the restricted set of Whitehead automorphisms of the group
$F_r$ defined in Section mi. Recall that

$$|T| = 5r^2 - 4r.$$

We say that $u \in Orb(w)$ is a local minimum (with respect to the length function)
if $u \neq w_{min}$ but $|ut| \ge |u|$ for any $t \in T$. If $u$ is a local minimum in $Orb(w)$
then a sequence of moves $t_1, \ldots, t_k$ such that $|u t_1 \ldots t_k| < |u|$ and $k$ is minimal
with this property is called a pick at $u$. We say that Whitehead's descent
algorithm with respect to $T$ (see Section 2.2) is monotone on $w$ if it does not
encounter any local minima.
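
A descent of this kind, and the way it can get stuck, can be illustrated with a small Python sketch. Several assumptions are made here and are not taken from the paper: the move set consists only of elementary Nielsen automorphisms (a stand-in for the restricted set T), the length is the ordinary length of the freely reduced word, and all function names are hypothetical.

```python
def free_reduce(word):
    """Cancel adjacent pairs y, -y (letter i is x_i, -i is its inverse)."""
    out = []
    for y in word:
        if out and out[-1] == -y:
            out.pop()
        else:
            out.append(y)
    return out

def apply_nielsen(word, t):
    """Apply t = (i, j, eps, side): x_i -> x_i x_j^eps (side 'right') or
    x_i -> x_j^eps x_i (side 'left'); all other generators are fixed."""
    i, j, eps, side = t
    image = []
    for y in word:
        if abs(y) != i:
            image.append(y)
        elif side == "right":
            image.extend([i, eps * j] if y > 0 else [-eps * j, -i])
        else:
            image.extend([eps * j, i] if y > 0 else [-i, -eps * j])
    return free_reduce(image)

def nielsen_moves(rank):
    """Stand-in move set (an assumption, not the paper's set T)."""
    return [(i, j, eps, side)
            for i in range(1, rank + 1) for j in range(1, rank + 1) if i != j
            for eps in (1, -1) for side in ("right", "left")]

def greedy_descent(word, rank):
    """Repeatedly apply the first move that shortens the word.

    Returns (final_word, path). The final word admits no length-reducing
    move from the stand-in set; it may be a local minimum in the sense of
    the text rather than a word of minimal length in its orbit.
    """
    moves = nielsen_moves(rank)
    word = free_reduce(word)
    path = []
    while True:
        for t in moves:
            candidate = apply_nielsen(word, t)
            if len(candidate) < len(word):
                word = candidate
                path.append(t)
                break
        else:
            return word, path
```

On the primitive word x1 x2 x2 of the free group of rank 2 (encoded as `[1, 2, 2]`), the descent is monotone: two moves of the form x1 -> x1 x2^{-1} bring it to the single letter x1.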

Conjecture 6 For "most" non-minimal elements $w \in F_r$ Whitehead's descent
algorithm with respect to $T$ is monotone. More precisely, let $NMin_l \subseteq F_r$
be the set of all non-minimal elements in $F_r$ of length $l$, and let $NMin_{l,T}$ be the
subset of those for which Whitehead's descent algorithm with respect to $T$ is
monotone. Then

$$\lim_{l \to \infty} \frac{|NMin_{l,T}|}{|NMin_l|} = 1$$

Moreover, the convergence is exponentially fast.

Observe that if Conjecture 6 holds then on most inputs $w \in NMin \subseteq F_r$
Whitehead's descent algorithm with respect to $T$ requires at most
$C \cdot r^2 \cdot WC(w) \cdot |w|$ steps to find $w_{min}$.

Now we are in a position to formulate the following conjecture.

Conjecture 7 The time complexity (or, at least, the average-case time complexity)
of Problem A on inputs $w \in NMin \subseteq F_r$ is bounded from above
by

$$P(r)\,WC(w)\,|w|$$

where $P(r)$ is a fixed polynomial.

Problem 1 What is the geometry of the graph $\Gamma(F_r, 1, \Omega_r)$? In particular, are
the connected components of $\Gamma(F_r, 1, \Omega_r)$ hyperbolic?

If uncovered, the geometric properties of the graphs $\Gamma(F_r, 1, \Omega_r)$ should provide
fast deterministic algorithms for Problems A and B.